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1 SUMMARY 

As a demonstration of the 
performance capabilities of trellis codes 
using multidimensional signal sets, a 

Viterbi decoder for one of the codes in 
[1] was designed. The choice of code was 
based on two factors. 

The first factor was its 
application as a possible replacement for 
the coding scheme currently used on the 
Hubble Space Telescope (HST). The HST at 
present uses the rate 1/3 v = 6 (with 

2 V = 64 states) convolutional code with 
BPSK modulation. With the modulator 

restricted to 3 Msym/s, this implies a 
data rate of only 1 Mbit/s, since the 

bandwidth efficiency K = 1/3 bit/sym. 
This is a very bandwidth inefficient 

scheme, although the system has the 

advantage of simplicity and large coding 
gain. 

The basic requirement from NASA was 
for a scheme that has as large a K as 
possible. Since a satellite channel was 
being used, 8PSK modulation was selected. 
This allows a K of between 2 and 3 
bit/sym. The next influencing factor was 
INTELSAT’s intention of transmitting the 
SONET 155.52 Mbit/s standard data rate 

over the 72 MHz transponders on its 
satellites. This requires a bandwidth 
efficiency of around 2.5 bit/sym. A 
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Reed-Solomon block code is used as an 
outer code to give very low bit error 
rates (BER). 

The 16 state rate 5/6, 2.5 bit/sym, 
4D-8PSK trellis code from [1] was 

selected. This code has reasonable 

complexity and has a coding gain of 

4.8 dB compared to uncoded 8PSK [2]. This 
trellis code also has the advantage that 

it is 45° rotationally invariant. This 
means that the decoder needs only to 

synchronise to one of the two naturally 
mapped 8PSK signals in the signal set. 

2 ENCODER IMPLEMENTATION 

At first, a systematic encoder was 
used in the design. However, it was found 
that in designing a Viterbi decoder, it 
would be simpler if a non- systematic 
convolutional encoder was used. This is 
because the state transitions in a 
non-systematic encoder are highly 
structured, compared with the almost 
“random” transitions of a systematic 
encoder. 

To convert the systematic encoder 
to a non-systematic form, the technique 
described in [3] is used. This method 
uses the fact that the impulse response 
of each shift register in a 
non-systematic encoder will produce 
output sequences that are equivalent to 
the generator polynomials. Since a 
systematic encoder must also produce the 
same sequences, it is relatively easy to 
find E linearly independent output 
sequences from a systematic encoder that 
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can be used as generators of a 
non-systematic encoder. 

There is usually more than one set 

of possible generator polynomials. The 

polynomials are chosen so that the inputs 
x‘(D) and x 2 (D) are affected by a 45° 
phase rotation in the same way as in a 
systematic encoder. Thus, the 

differential encoder for the systematic 
code can also be used for the non- 

systematic encoder. The non-systematic 
encoder equations that were found for the 
4D-8PSK code are 

z 2 (D) = x 2 (D) © (D 2 © l)x'(D), (la) 

z'(D) = D 2 x 2 (D) © (D 2 © D © l^CD), (lb) 
z°(D) = Dx 2 (D). (lc) 

Figure 1 illustrates the new 
non-systematic encoder. After a 45° phase 
rotation, we have z 2 (D) = z 2 (D), 

zJ(D) = z^D) © 1(D), and z*(D) = z°(D). 

Rotating the equations in (1) gives 

x 2 (D) = x 2 (D) and x|(D) = x'(D) © 1(D), 

the same as for the systematic encoder. 

The encoder uses a Phase Locked 


Loop (PLL) to generate the two times 

clock for transmitting the two 2D 

symbols. This PLL is based on the 

74HC4046 Integrated Circuit (IC). The 

encoder is able to accept data either 
serially or in five bit bytes. 

3 DECODER IMPLEMENTATION 

Due to the complexity of the 

decoder design, only a brief description 

is given here. As such, only the 
important design decisions are described. 

To reduce the cost of the codec, a 
serial implementation of the decoder was 
chosen. That is, one clock cycle would be 

required for each state of the code. 

Since there are 16 states, at least 16 

clock cycles are required to process each 
received 4D point. As will be described 

in more detail later, an extra seven 

clock cycles are required for start-up 

purposes. Thus, a total of 23 clock 

cycles are required for each iteration of 
the Viterbi algorithm. 

The technology and clock speed in 
our design is the same as used in another 
Viterbi decoder designed by the author 

[4]. This gave us greater confidence that 
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Figure 2: Block diagram of a Viterbi decoder for the 16 state 2.5 bit/sym 4D-8PSK trellis code. 


the design would work, even though the 
actual design is twice as complicated. 
Our design uses a 10 MHz clock (giving 
100 ns clock cycles) and Schottky TTL 
logic for its ease of use and large 

variety of functions. The actual 
technologies used are 74LS (Low-power 
Schottky TTL) for non-time critical 
sections of the circuit and 74F (Advanced 
Schottky TTL) for time critical sections. 

Other technologies are used for functions 
not available in 74F or 74LS. 

The decoder is operated 

asynchronously to the received data 

clock. This requires one of the seven 

extra clock cycles described above. 
Internally, the decoder operates 
synchronously to the 10 MHz clock. The 

decoder starts operation after detecting 

the first rising edge of the received 4D 
symbol clock. After 23 clock cycles, the 
decoder stops and waits for the next 

rising edge of the 4D symbol clock. This 
allows the decoder to operate at any data 
rate from 0 to 2.1 Mbit/s. 

Each iteration of the Viterbi 
algorithm decodes five bits for each 
received 4D signal point (since the code 

rate is 5/6). The maximum 4D symbol rate 
of the decoder is the internal clock 
speed divided the number of clock cycles 
required to decode the five bits, i.e., 

4.35xl0 5 4D symbols per second. 
Therefore, the maximum bit rate of the 
decoder is 2.17 Mbit/s. For the HST, this 
code could achieve a data rate up to 7.5 
Mbit/s. For actual use on the HST, it is 


intended that the decoder would be 
implemented on a VLSI chip, where the 
required decoding speed would be 
achieved. 

There are six main sections in the 
Viterbi decoder. These are 

• Branch Metric Calculator (BMC) 

• State Metric Calculator (SMC) 

• Survivor Sequence Memory (SSM) 

• Signal Set Synchroniser (SSS) 

• Minimum State Metric Selector (MSMS) 

• Branch Point Selector (BPS) 

Figure 2 illustrates a block diagram of 
the decoder. The above sections are 

described as follows. 

3.1 Branch Metric Calculator 

For each transition of the trellis 
there are 8 parallel paths (due to the 

three unchecked bits in the encoder). The 

BMC must determine which of the paths is 

closest to the received 4D signal point 
(the Branch Point (BP)) as well as the 

Branch Metric (BM) for this path. The BM 
can be calculated in a number of ways. 
The optimum BM’s for AWGN channels with 
quantisation are log- likelihood metrics 

[4]. Alternatively, one could make an 
approximation based on the squared 
Euclidean distance between the received 
point and the points along the 
transitions. 

In our design we have chosen to use 
Read Only Memory’s (ROM’s) to store the 


77 








precalculated BP (three bits are used to 
represent each parallel path) and BM 

(based on log-likelihood metrics). The 
encoder can produce one of eight (i.e., 

2 k+I ) sets of parallel paths (each 
containing 8 paths). The BP and BM must 
be determined for each of these eight 
sets of parallel paths. 

We have chosen four bits to 
represent the BM value. This gives a BM 
range from 0 (closest to the received 4D 
point) to 15 (furthest from the 4D 

point). Decoder simulations in [5] for 
another multi-D trellis code indicate 
that this amount of quantisation results 
in little performance degradation. 

To minimise the number of address 
bits to the ROM, each received 2D signal 
point has been quantised to seven bits. 

After extensive simulations in [5] for a 
6D-8PSK trellis code, it was found that 
pie-chart or angular quantisation results 
in the least performance degradation (0.2 
to 0.3 dB for five bit quantisation). The 
simulations included the “dartboard” 
quantisation pattern proposed in [1]. 

Each ROM therefore has an address 
space of 14 bits (seven bits for each 2D 
symbol). The ROM’s used for the BMC are 
32Kx8 27C256’s. A total of 6 ROM’s were 
used, two for determining the BP’s and 

four for the eight BM’s. 

Alternative BMC schemes which 
exploit the finite length trellis 
structure of the parallel transitions 
were also considered. That is, a Viterbi 

like decoder can be used to decode the 
parallel transitions. However, their 
large complexity (in a discrete 

implementation) led us to choose the 
simpler ROM look-up method. For a VLSI 
implementation, though, the trellis 
decoding method would be preferable due 
to the flexibility that VLSI provides in 
designing circuits. Thus, the Viterbi 

decoder (with the BMC) could be 
implemented on a single chip. 

3.2 State Metric Calculator 

The SMC updates the State Metrics 
(SM) for each state of the code in each 
iteration of the Viterbi algorithm. A SM 
is an indication of how close the 

received sequence is to the closest path 


of all paths leading into a particular 
state. Since the code has two checked 
bits, there are four paths leading into 

each state (since we choose the closest 
path among the 8 parallel paths in the 

BMC). For each of the four paths, we must 
add the BM for that path to its 

corresponding SM (also known as the old 
SM) from the previous iteration. The new 

SM for the four paths leading into a 
state is the smallest of these 

summations. This path is selected and all 
other paths are eliminated. This is 

called the Add-Compare- Select (ACS) 

operation. 

With four paths into each state a 
4:1 ACS circuit is required. With 16 

states in our code, the ACS operation 

needs to be performed 16 times 

(explaining the need for 16 clock 

cycles). The ACS circuit also produces 

two Path Decision (PD) bits which 

indicate which of the four paths was 

chosen. This information is passed to the 
SSM where it is stored. 

Since the decoder operates 

serially, only one ACS circuit is 

required. The 16 SM’s are stored in two 
74AS870 dual 16x4 static Random Access 
Memory (RAM) chips. Eight bits are used 

to represent each SM. As shown in [5] for 
a 6D-8PSK trellis code, this is more than 
enough bits when two’s complement 

arithmetic is used in the ACS circuit to 

prevent overflow [4]. Before the first 

new SM can be calculated, four old SM’s 
are read out from the RAM’s. This takes 

four clock cycles. It takes another two 

clock cycles to perform the ACS 
operation. To achieve a slightly higher 
speed, we could have done the ACS 

operation in one clock cycle. However, 

this would have required six comparator 
chips to find the minimum SM. An increase 

of one clock cycle and the use of three 

comparator chips was chosen to decrease 
the complexity of the design. 

Another clock cycle is used to 

write to the other half of the dual 16x4 

RAM’s. Since all the read and ACS 

operations are pipelined, an additional 
15 clock cycles are required to write the 

15 remaining new SM’s. In the next 
iteration of the algorithm we read from 

where the SM’s were written in the 
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previous iteration and write to where the 
old SM’s had been stored. The process 
then repeats. 

For the ACS circuit, the 
appropriate BM’s must be added to the 

correct old SM’s. Twelve quad 2:1 
multiplexer chips and a copy of the 

convolutional encoder are needed to 
accomplish this task. 

3.3 Survivor Sequence Memory 

The SSM has two tasks. It must 
store the Path Decisions (PD’s) generated 
by the SMC and “traceback” through the 
previously stored PD’s to determine the 

final decoded bits for x 2 and x . This 
requires alternating write and read (for 

the traceback) operations on the memory. 
The traceback depth is the required 
number of PD sets (each set consists of 
16 two bit PD’s) that the SSM must trace 
back through. 

The PD’s must be stored in the 
remaining 16 clock cycles that are 
available. There are two ways this can be 
achieved. Storing two PD bits in each 

clock cycle or storing four PD bits in 
every other cycle, leaving the alternate 
cycle to perform part of the traceback. 

With the first method at least two 

separate memories are required since the 
traceback operation cannot be performed 
simultaneously with the storage of the 
new set of PD’s (due to the design of 

memory chips). Since there is a finite 
amount of memory, the oldest PD set must 
be written over. 

There is usually a point where one 
method is better than the other (in terms 
of the total memory size required) based 
on the number of clock cycles available 
and the traceback depth. A traceback 
depth of around 25 to 30 results in 
little performance degradation [5]. 
Comparing the implementation complexity 
of the two methods, the alternating 
read/write method proved superior. 

With this design only eight clock 
cycles are available to perform a 

traceback. To maintain integer power of 2 
address spaces for the memories (and thus 
efficiently use of practical memory 
designs), a traceback depth of seven is 

used for each SSM memory chip. To achieve 


the required traceback depth, four 64x4 
memories are required. This gives a 

traceback depth of 28. The traceback is 
performed in a pipelined fashion, 
switching between memories when required 
and waiting for the next received set of 
data to continue with the traceback. Four 
separate memories are required since 
there are four tracebacks in operation at 
any one time. 

Since there are no 64x4 RAM’s 
commercially available, larger 256x4 
93422A RAM’s were used. This chip has 
separate input and output data buses 
which simplifies the SSM design. We use 
the state with the smallest SM to start 
the traceback. This is the best state the 

SSM could start with (since it 

corresponds to the path that is closest 

to the received signal) and helps give 

the decoder a slight performance 

improvement over choosing a random or a 
fixed state. The Minimum State Metric 

Selector (MSMS) provides the information 
needed to achieve this. 

At the correct time and pla^e in 

the circuit, the two decoded bits x and 

\ are produced. The two bits are passed 
to the Branch Point Selector (BPS) where 
they are re-encoded to select one of the 

eight 3 bit branch points. The branch 
points are delayed by 34 4D symbol 

periods, 28 due to the traceback, 4 due 

to the pipeline delay in the traceback, 
and 2 due to the re-encoding of the 

decoded data. 

The five decoded bits are then 
differentially decoded (optional) and 
then parallel to serial converted for the 

final decoder output. Precoding and 
postdecoding are optional as there are 

some communication systems that do not 

require phase synchronisation. For 
example, a burst modem can provide phase 
information in the preamble of a burst. A 
74HC4046 PLL is used to generate the 
required five times clock for the serial 

data. This PLL is tuned to lock within 0 
to 2 MHz, but as expected for PLL’s the 
lower frequency limit will be somewhat 
greater than DC. The decoded data is also 
available in five bit bytes. 
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3.4 Signal Set Synchroniser 

The SSS has the task of 
synchronising the decoder to the received 
sequence of 2D symbols. Since the signal 
set consists of two 2D signals, the 
decoder must synchronise to one of the 

two possible ways the received data can 
arrive. 

The decoder is asynchronously 
locked to DATCLK, which is the received 
2D symbol clock whose frequency has been 
divided by two. A delay of zero or one 2D 
symbol periods of DATCLK is used for 
timing synchronisation. 

The SSS works by examining the rate 
of increase of the minimum SM from the 
MSMS. If the rate is high, this indicates 
that the decoder is out of synch and 
needs to be resynchronised. A variable 

threshold in the SSS is used for this 
purpose. If the threshold is exceeded, 

the SSS will toggle into the “arm symbol 
toggel” state. 

If the threshold is again exceeded 

in the next V (V is a variable from 0 to 

63) 4D symbol periods the decoder will 
toggle the 2D symbol delay (from zero to 

one or one to zero). The SSS then ignores 
the decoder for 128+V 6D symbol periods 
to allow the decoder to settle into its 

new signal set configuration. 

If the threshold is not exceeded 
the SSS will “disarm” and return to its 

normal monitoring state. 

4 OTHER DECODER FEATURES 


data. One switch is used to select 

between 3 bit phase (corresponding to 

hard decision), 7 bit phase quantisation, 
5 bit I and Q quantisation, or internal 
loopback mode. The other switch selects 
between signed magnitude, reverse binary, 
straight binary, or two’s complement data 
formats for I and Q received data. 

There are also switches for 
disabling the postdecoder from the 

decoder and the precoder from the 
encoder. The encoder has another switch 
to select between five bit parallel or 

bit serial data. The decoder also has a 
reset button to force all the SM’s to 
zero. The encoder/decoder interface 
diagram is given in Figure 3. 
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The encoder and decoder are mounted 
within a 3U high 19 inch rack. On the 

front panel, two Light Emitting Diodes 
(LED’s) are used to indicate the 2D 

symbol delay. 

To test the decoder, the 2D symbol 
delay can be independently set to manual 
control. In this way, the SSS can be 
isolated from the rest of the circuitry 

so that any problems with the rest of the 
decoder can be fixed without the SSS 
interfering. It can also be used to test 

the SSS by manually introducing delays 

into the received signal. There are two 

switches used for this. 

Two rotary type switches are used 
to select the format of the received 


Figure 3: Viterbi decoder/encoder interface 
diagram for 16 state 2.5 bit/sym 4D-8PSK 
trellis code. 

The 159 integrated circuits of the 
design are placed on two double height 
Speedwire Eurocards (233.4x220 mm). 
Speedwire allows quick and reliable 
connections (if it is done correctly) 

between the chips that can be easily 

changed. The speedwire boards also have 
good groundplanes, critical when 
operating at high clock speeds. The 
Viterbi decoder (which operates at 10 
MHz) is placed on one board (taking 96 
chips) while the encoder, SSS, and 

various interface chips are placed on the 
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other board. 

BNC connectors are used at the back 
of the rack for external data and clock 

connections. It is assumed that all 

received data changes on the rising edge 
of its clock. Similarly, the codec 
produces its signals in the same format. 
TTL 75 Q interface signals are used for 
these external interfaces. 

6 CONCLUSIONS 

A serial implementation of a 
Viterbi decoder for the 16 state 2.5 

bit/sym code with a 4D-8PSK signal set 

has been described. This decoder can 
provide high data rates (up to 2.1 

Mbit/s) and is intended for future use on 
the Hubble Space Telescope. Due to its 
serial implementation the decoder design 
is quite complex, but could be 
implemented on a single VLSI integrated 
circuit. 

The Branch Metric Calculator has 
been implemented through the use of large 
look-up table ROM’s. A VLSI 
implementation may use a Viterbi type 
decoding algorithm to allow single chip 
implementation. 
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